TREC 2003 Genomics Track Experiments at UTA: Query Expansion with Predefined High Frequency Terms
Abstract
We studied the effects of query expansion and query structure on retrieval performance. Two sets of words that occur frequently in documents relevant to the Genomics Track’s training topics were collected, the first manually and the second automatically. The collected high-frequency words and the names of organisms designated in the test topics were used as expansion keys in gene name queries formed from the final test topics. The results indicated that Boolean structured queries expanded with automatically collected high-frequency words and names of organisms performed considerably better than queries containing only gene names as keys. In the Boolean queries, the expansion keys were categorized by the aspects they represent in documents discussing gene function. All the structured queries performed better than unstructured queries, in which each key contributed equally to document weights. In the structured queries, gene names were assigned more weight than the expansion keys.
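The contrast between structured and unstructured queries can be illustrated with a minimal Python sketch. It is not the system used in the experiments: the `Facet` class, both scoring functions, the weights, the example keys, and the two documents are all hypothetical, and the abstract does not state how the aspect groups were combined. The sketch assumes keys are OR-ed within an aspect, aspects are AND-ed, and the gene name aspect carries twice the weight of the expansion aspects.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Facet:
    """One aspect of the query: a group of interchangeable keys (hypothetical)."""
    name: str
    keys: List[str]
    weight: float = 1.0


def structured_score(facets: List[Facet], text: str) -> float:
    """Assumed Boolean structure: OR within a facet, AND across facets."""
    tokens = set(text.lower().split())
    score = 0.0
    for facet in facets:
        if any(key.lower() in tokens for key in facet.keys):
            score += facet.weight  # a facet counts once, however many keys match
        else:
            return 0.0  # a missing facet rejects the document
    return score


def unstructured_score(facets: List[Facet], text: str) -> float:
    """Flat bag of keys: every matching key contributes equally."""
    tokens = set(text.lower().split())
    return float(sum(1 for f in facets for k in f.keys if k.lower() in tokens))


# Illustrative data only; none of it comes from the track's topics.
FACETS = [
    Facet("gene", ["brca1"], weight=2.0),
    Facet("organism", ["mouse", "mus"]),
    Facet("function", ["expression", "mutation", "function"]),
]
DOC_A = "brca1 mutation observed in mouse tissue"
DOC_B = "expression mutation function in mouse"

if __name__ == "__main__":
    for label, doc in (("A", DOC_A), ("B", DOC_B)):
        print(label, structured_score(FACETS, doc), unstructured_score(FACETS, doc))
```

In this toy setting the unstructured scorer ranks document B above document A even though B never mentions the gene, because B matches more expansion keys. The structured scorer rejects B and keeps A, which is one plausible reading of why weighting gene names above expansion keys helps.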
Similar resources
Task-Specific Query Expansion (MultiText Experiments for TREC 2003)
I. INTRODUCTION For TREC 2003 the MultiText Project focused its efforts on the Genomics and Robust tracks. We also submitted passage-retrieval runs for the QA track. For the Genomics Track primary task, we used an amalgamation of retrieval and query expansion techniques, including tiering, term rewriting and pseudo-relevance feedback. For the Robust Track, we examined the impact of pseudo-relev...
TREC 2004 Genomics Track Experiments at UTA: The Effects of Primary Keys, Bigram Phrases and Query Expansion on Retrieval Performance
We submitted runs for Genomics Track’s ad hoc retrieval task. The first official run (utaauto) was an automatic run and the second (utamanu) manual. For utaauto, the main features of query formulation were the removal of performative and marginally topical words from the topics based on average term frequency statistics, the removal of stop-words, the identification of bigram phrases, the weigh...
TREC 2005 Genomics Track Experiments at IBM Watson
This paper describes our experiments in the TREC 2005 Genomics Track. For the ad-hoc retrieval task, we study synonym-based query expansion, as well as the effectiveness of a new pseudo-relevance feedback method which is derived from our recent work on semi-supervised learning. For the categorization task, we study various methods for estimating conditional class probability and determining the...
Application of Information Technology: Essie: A Concept-based Search Engine for Structured Biomedical Text
This article describes the algorithms implemented in the Essie search engine that is currently serving several Web sites at the National Library of Medicine. Essie is a phrase-based search engine with term and concept query expansion and probabilistic relevancy ranking. Essie's design is motivated by an observation that query terms are often conceptually related to terms in a document, without ...
Symbol-Based Query Expansion Experiments at TREC 2005 Genomics Track
This paper illustrates the activity conducted at the TREC 2005 evaluation campaign in the ad-hoc task of the Genomics track. The retrieval effectiveness of a relevance feedback query expansion algorithm, which is based on symbols, is studied. The experimental results suggest that query expansion based on implicit relevance feedback is not always an effective means for improving effectiveness in...